\[ \LARGE{p(\theta | X) \propto p(X | \theta) \ p(\theta)} \]
\[ \LARGE{\text{posterior} \propto \text{likelihood} \times \text{prior}} \]
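The proportionality above can be made concrete with a grid approximation: evaluate likelihood × prior at many candidate values of $\theta$ and normalize. A minimal sketch, using a hypothetical coin-flip example (6 heads in 9 flips, Bernoulli likelihood, uniform prior):

```python
import numpy as np

# Grid approximation of Bayes' rule: posterior ∝ likelihood × prior.
# Hypothetical example: infer a coin's heads-probability theta
# after observing 6 heads in 9 flips, under a flat prior.
theta = np.linspace(0, 1, 101)          # candidate values of theta
prior = np.ones_like(theta)             # flat prior p(theta)
likelihood = theta**6 * (1 - theta)**3  # p(X | theta): 6 heads, 3 tails
unnormalized = likelihood * prior
posterior = unnormalized / unnormalized.sum()  # normalize to sum to 1

# The posterior mode lands near the sample proportion 6/9.
mode = theta[np.argmax(posterior)]
```

Normalization is what turns the right-hand side into an actual distribution; the proportionality sign hides the constant $p(X)$.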
\[ \theta = \text{some value} \\ X \sim \text{some distribution} \]
\[ \theta \sim \text{some distribution} \\ X = \text{some value} \]
\[ \theta = \text{some } \color{red}{\text{unknown}} \text{ value} \\ X \sim \text{some distribution} \]
\[ \theta \sim \text{some distribution} \\ X = \text{some } \color{red}{\text{known}} \text{ value} \]
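The contrast above can be sketched numerically. In a hypothetical coin example (7 heads in 10 flips), the frequentist view reports a single point estimate for the fixed-but-unknown $\theta$, while the Bayesian view returns a whole distribution over $\theta$ conditioned on the known data (here via Beta–Bernoulli conjugacy, an assumption of this sketch):

```python
# Hypothetical data: 7 heads observed in 10 flips.
heads, n = 7, 10

# Frequentist: theta is a fixed unknown value -> report a point estimate.
theta_mle = heads / n  # maximum-likelihood estimate

# Bayesian: theta gets a distribution. With a Beta(1, 1) (uniform) prior,
# conjugacy gives the posterior Beta(1 + heads, 1 + tails).
a, b = 1 + heads, 1 + (n - heads)  # Beta(8, 4)
posterior_mean = a / (a + b)       # 8/12
```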
Covariance is a multivariate generalization of variance. Correlation is a normalized covariance (i.e., scaled to \([-1, 1]\)).
\[ \text{Cov}[\mathbf{x}] = \mathbf{\Sigma} = \begin{bmatrix} \text{Var}[X_1] & \text{Cov}[X_1, X_2] & \text{Cov}[X_1, X_3] \\ \text{Cov}[X_2, X_1] & \text{Var}[X_2] & \text{Cov}[X_2, X_3] \\ \text{Cov}[X_3, X_1] & \text{Cov}[X_3, X_2] & \text{Var}[X_3] \end{bmatrix} \]
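A matrix like $\mathbf{\Sigma}$ can be estimated directly from data. A short sketch with synthetic, illustrative numbers: draw samples from a known 3-variable Gaussian, then estimate the covariance matrix and its normalized counterpart, the correlation matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative 3-variable dataset (columns X1, X2, X3): X1 and X2 are
# strongly related, X3 is independent with larger variance.
x = rng.multivariate_normal(
    mean=[0.0, 0.0, 0.0],
    cov=[[1.0, 0.8, 0.0],
         [0.8, 1.0, 0.0],
         [0.0, 0.0, 2.0]],
    size=500,
)

sigma = np.cov(x, rowvar=False)       # 3x3 covariance matrix, as above
corr = np.corrcoef(x, rowvar=False)   # normalized: entries in [-1, 1]
```

The diagonal of `sigma` holds the variances; dividing each entry $\text{Cov}[X_i, X_j]$ by $\sqrt{\text{Var}[X_i]\,\text{Var}[X_j]}$ is exactly the normalization that `np.corrcoef` performs.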
Multivariate models are generalizations of their univariate counterparts. Why do we need the added complexity? And what is a generalized linear model?
We can express any probabilistic model (i.e., joint distribution) as a graph in which conditional independence and causal structure are encoded.
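Such a graph encodes a factorization of the joint. A toy sketch (hypothetical numbers) for a chain $A \rightarrow B \rightarrow C$ over binary variables, where the joint factorizes as $p(a, b, c) = p(a)\,p(b \mid a)\,p(c \mid b)$:

```python
import itertools

# Toy directed graphical model A -> B -> C over binary variables.
# The graph encodes the factorization p(a, b, c) = p(a) p(b|a) p(c|b).
p_a = {0: 0.6, 1: 0.4}
p_b_given_a = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}
p_c_given_b = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}

def joint(a, b, c):
    """Joint probability read off the graph's factorization."""
    return p_a[a] * p_b_given_a[a][b] * p_c_given_b[b][c]

# A valid joint distribution must sum to 1 over all outcomes.
total = sum(joint(a, b, c) for a, b, c in itertools.product([0, 1], repeat=3))
```

The graph also encodes conditional independence: here $C \perp A \mid B$, since $p(c \mid a, b) = p(c \mid b)$ by construction.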
::::